data tree
Learning Data Dependency with Communication Cost
Jang, Hyeryung, Song, HyungSeok, Yi, Yung
In this paper, we consider the problem of recovering a graph that represents the statistical data dependency among nodes for a set of data samples generated by nodes, which provides the basic structure to perform an inference task, such as MAP (maximum a posteriori). This problem is referred to as structure learning. When nodes are spatially separated in different locations, running an inference algorithm requires a non-negligible amount of message passing, incurring some communication cost. We inevitably have the trade-off between the accuracy of structure learning and the cost we need to pay to perform a given message-passing based inference task because the learnt edge structures of data dependency and physical connectivity graph are often highly different. In this paper, we formalize this trade-off in an optimization problem which outputs the data dependency graph that jointly considers learning accuracy and message-passing costs. We focus on a distributed MAP as the target inference task, and consider two different implementations, ASYNC-MAP and SYNC-MAP that have different message-passing mechanisms and thus different cost structures. In ASYNC- MAP, we propose a polynomial time learning algorithm that is optimal, motivated by the problem of finding a maximum weight spanning tree. In SYNC-MAP, we first prove that it is NP-hard and propose a greedy heuristic. For both implementations, we then quantify how the probability that the resulting data graphs from those learning algorithms differ from the ideal data graph decays as the number of data samples grows, using the large deviation principle, where the decaying rate is characterized by some topological structures of both original data dependency and physical connectivity graphs as well as the degree of the trade-off. We validate our theoretical findings through extensive simulations, which confirms that it has a good match.
Bisimulations on Data Graphs
Abriola, Sergio, Barceló, Pablo, Figueira, Diego, Figueira, Santiago
Bisimulation provides structural conditions to characterize indistinguishability from an external observer between nodes on labeled graphs. It is a fundamental notion used in many areas, such as verification, graph-structured databases, and constraint satisfaction. However, several current applications use graphs where nodes also contain data (the so called "data graphs"), and where observers can test for equality or inequality of data values (e.g., asking the attribute 'name' of a node to be different from that of all its neighbors). The present work constitutes a first investigation of "data aware" bisimulations on data graphs. We study the problem of computing such bisimulations, based on the observational indistinguishability for XPath ---a language that extends modal logics like PDL with tests for data equality--- with and without transitive closure operators. We show that in general the problem is PSpace-complete, but identify several restrictions that yield better complexity bounds (coNP, PTime) by controlling suitable parameters of the problem, namely the amount of non-locality allowed, and the class of models considered (graphs, DAGs, trees). In particular, this analysis yields a hierarchy of tractable fragments.
Bisimulations on Data Graphs
Abriola, Sergio (Universidad de Buenos Aires) | Barceló, Pablo (Center for Semantic Web Research and University of Chile) | Figueira, Diego (Centre National de la Recherche Scientifique (CNRS)) | Figueira, Santiago (Universidad de Buenos Aires and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET))
Bisimulation provides structural conditions to characterize indistinguishability between nodes on graph-like structures from an external observer. It is a fundamental notion used in many areas. However, many applications use graphs where nodes have data, and where observers can test for equality or inequality of data values (e.g., asking the attribute "name" of a node to be different from that of all its neighbors). The present work constitutes a first investigation of "data aware"' bisimulations on data graphs. We study the problem of computing such bisimulations, based on the observational indistinguishability for XPath — a language that extends modal logic with tests for data equality. We show that in general the problem is pspace-complete, but identify several restrictions that yield better complexity bounds (coNP, ptime) by controlling suitable parameters of the problem; namely, the amount of em non-locality allowed, and the class of models considered (graph, DAG, tree). In particular, this analysis yields a hierarchy of tractable fragments.
Model Theory of XPath on Data Trees. Part I: Bisimulation and Characterization
Figueira, Diego, Figueira, Santiago, Areces, Carlos
We investigate model theoretic properties of XPath with data (in)equality tests over the class of data trees, i.e., the class of trees where each node contains a label from a finite alphabet and a data value from an infinite domain. We provide notions of (bi)simulations for XPpath logics containing the child, parent, ancestor and descendant axes to navigate the tree. We show that these notions precisely characterize the equivalence relation associated with each logic. We study formula complexity measures consisting of the number of nested axes and nested subformulas in a formula; these notions are akin to the notion of quantifier rank in first-order logic. We show characterization results for fine grained notions of equivalence and (bi)simulation that take into account these complexity measures. We also prove that positive fragments of these logics correspond to the formulas preserved under (non-symmetric) simulations. We show that the logic including the child axis is equivalent to the fragment of first-order logic invariant under the corresponding notion of bisimulation. If upward navigation is allowed the characterization fails but a weaker result can still be established. These results hold both over the class of possibly infinite data trees and over the class of finite data trees. Besides their intrinsic theoretical value, we argue that bi-simulations are useful tools to prove (non)expressivity results for the logics studied here, and we substantiate this claim with examples.
Conjunctive Queries over Trees
Gottlob, Georg, Koch, Christoph, Schulz, Klaus U.
We study the complexity and expressive power of conjunctive queries over unranked labeled trees represented using a variety of structure relations such as ``child'', ``descendant'', and ``following'' as well as unary relations for node labels. We establish a framework for characterizing structures representing trees for which conjunctive queries can be evaluated efficiently. Then we completely chart the tractability frontier of the problem and establish a dichotomy theorem for our axis relations, i.e., we find all subset-maximal sets of axes for which query evaluation is in polynomial time and show that for all other cases, query evaluation is NP-complete. All polynomial-time results are obtained immediately using the proof techniques from our framework. Finally, we study the expressiveness of conjunctive queries over trees and show that for each conjunctive query, there is an equivalent acyclic positive query (i.e., a set of acyclic conjunctive queries), but that in general this query is not of polynomial size.